Exploratory tools for clustering multivariate data

Authors

  • Anthony C. Atkinson
  • Marco Riani
Abstract

The forward search provides a series of robust parameter estimates based on increasing numbers of observations. The resulting series of robust Mahalanobis distances is used to cluster multivariate normal data. The method depends on envelopes of the distribution of the test statistics in forward plots. These envelopes can be found by simulation; flexible polynomial approximations to the envelopes are given. New graphical tools provide methods not only of detecting clusters but also of determining their membership. Comparisons are made with mclust and k-means clustering. © 2007 Elsevier B.V. All rights reserved.
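
The idea of a forward search monitored through Mahalanobis distances can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `forward_search`, the `start_idx` argument, the choice of starting subset, and the monitored statistic (the smallest distance among observations outside the current subset) are assumptions made for illustration, and the simulation envelopes described in the abstract are omitted.

```python
import numpy as np

def forward_search(X, start_idx):
    """Hypothetical sketch: grow a subset one observation at a time and
    record the minimum Mahalanobis distance among points outside it."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    subset = np.array(sorted(start_idx))
    sizes, min_dist = [], []
    while len(subset) < n:
        # Parameter estimates from the current subset only
        mean = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        # Squared Mahalanobis distances of all n observations
        diff = X - mean
        d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        outside = np.setdiff1d(np.arange(n), subset)
        sizes.append(len(subset))
        min_dist.append(np.sqrt(d2[outside].min()))
        # Next subset: the m + 1 observations closest to the current fit
        subset = np.argsort(d2)[: len(subset) + 1]
    return np.array(sizes), np.array(min_dist)

# Usage on synthetic data (assumed, not from the paper): two normal groups,
# with the search started inside the first group.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(60, 2))
b = rng.normal(8.0, 1.0, size=(40, 2))
X = np.vstack([a, b])
sizes, min_dist = forward_search(X, start_idx=range(10))
```

Plotting `min_dist` against `sizes` gives a simple forward plot; a sharp rise in the curve suggests that the subset has absorbed one group and the next candidate observation belongs to another.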

Similar resources

Interpretability and Informativeness of Clustering Methods for Exploratory Analysis of Clinical Data

Clustering methods are among the most commonly used tools for exploratory data analysis. However, using clustering to perform data analysis can be challenging for modern datasets that contain a large number of dimensions, are complex in nature, and lack a ground-truth labeling. Traditional tools, like summarization and plotting of clusters, are of limited benefit in a high-dimensional setting. ...

Incorporating Density Estimation into Other Exploratory

Preliminary understanding of a new data set is routinely accomplished with graphical tools, such as those popularized originally by EDA. A number of more recent ideas for multivariate data analysis have emerged and some are available in software packages or shareware such as XGobi. In this talk, we illustrate how many of the point-oriented techniques can be supplemented by incorporating nonpara...

Model-Based Clustering and Classification of Functional Data

The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering technique...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

A growth curve approach to analyzing multiple-valued expression data

There is significant literature which explores methods for clustering time-series gene-expression data sets, such as the classical data set due to Spellman et al. (1998). For instance, James and Hastie (2001) use linear or quadratic discriminant functions on fitted curves, while Bar-Joseph et al. (2003), using a similar approach, do the clustering based on the coefficients of the fitted splines. I...

Journal:
  • Computational Statistics & Data Analysis

Volume 52

Publication date: 2007